An Efficient Compiler for Weighted Rewrite Rules
Authors
Abstract
Context-dependent rewrite rules are used in many areas of natural language and speech processing. Work in computational phonology has demonstrated that, given certain conditions, such rewrite rules can be represented as finite-state transducers (FSTs). We describe a new algorithm for compiling rewrite rules into FSTs. We show the algorithm to be simpler and more efficient than existing algorithms. Further, many of our applications demand the ability to compile weighted rules into weighted FSTs, transducers generalized by providing transitions with weights. We have extended the algorithm to allow for this.
1. Motivation
Rewrite rules are used in many areas of natural language and speech processing, including syntax, morphology, and phonology [1]. In interesting applications, the number of rules can be very large. It is then crucial to give a representation of these rules that leads to efficient programs. Finite-state transducers provide just such a compact representation (Mohri, 1994). They are used in various areas of natural language and speech processing because their increased computational power enables one to build very large machines to model interestingly complex linguistic phenomena. They also allow algebraic operations such as union, composition, and projection which are very useful in practice (Berstel, 1979; Eilenberg, 1974-1976). And, as originally shown by Johnson (1972), rewrite rules can be modeled as finite-state transducers, under the condition that no rule be allowed to apply any more than a finite number of times to its own output.
[1] Parallel rewrite rules also have interesting applications in biology. In addition to their formal language theory interest, systems such as those of Aristid Lindenmayer provide rich mathematical models for biological development (Rozenberg and Salomaa, 1980).
Kaplan and Kay (1994), or equivalently Karttunen (1995), provide an algorithm for compiling rewrite rules into finite-state transducers, under the condition that they do not rewrite their non-contextual part [2]. We here present a new algorithm for compiling such rewrite rules which is both simpler to understand and implement, and computationally more efficient. Clarity is important since, as pointed out by Kaplan and Kay (1994), the representation of rewrite rules by finite-state transducers involves many subtleties. Time and space efficiency of the compilation are also crucial. Using naive algorithms can be very time consuming and lead to very large machines (Liberman, 1994). In some applications such as those related to speech processing, one needs to use weighted rewrite rules, namely rewrite rules to which weights are associated. …
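To make the notion of a weighted context-dependent rewrite rule concrete, the following is a minimal sketch of what one rule of the form phi -> psi / lambda _ rho does to a string. It is not the compilation algorithm described in the paper: it interprets a single rule directly on a string instead of building a transducer. The function name `apply_weighted_rule` and its parameters are hypothetical, the contexts are assumed to be literal strings rather than regular expressions, they are matched against the original input only, and weights are assumed to be costs that add up along the rewritten string.

```python
def apply_weighted_rule(text, phi, psi, left, right, weight):
    """Apply phi -> psi / left _ right obligatorily, scanning left to right.

    Returns (rewritten_text, total_cost). Each rewrite adds `weight` to the
    cost (an assumption: weights are additive costs). Contexts are literal
    strings checked against the original input, which is a simplification.
    """
    if not phi:
        raise ValueError("phi must be a non-empty string")
    pieces = []
    cost = 0.0
    i = 0
    while i < len(text):
        end = i + len(phi)
        if (text.startswith(phi, i)            # phi occurs at position i
                and text.endswith(left, 0, i)  # left context ends just before i
                and text.startswith(right, end)):  # right context starts after phi
            pieces.append(psi)
            cost += weight
            i = end
        else:
            pieces.append(text[i])
            i += 1
    return "".join(pieces), cost


# Hypothetical rule: a -> b / c _ d, with cost 0.5 per application.
result = apply_weighted_rule("cab cad", "a", "b", "c", "d", 0.5)
```

In this usage example only the second "a" is rewritten, because the first one is not followed by "d". A compiled transducer would encode the same input-output relation, with regular-expression contexts, as a single weighted machine that can be combined with others by composition.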
Similar resources
Axioms as generic rewrite rules in C++ with concepts
Compilers are typically hardwired to attempt many optimizations only on expressions that involve particular built-in types. Ideally, however, an optimizing compiler would recognize a rewrite opportunity for user-defined types as well, whenever the operands of an expression satisfy the algebraic properties that justify the rewrite. This paper applies the principles and techniques of generic prog...
A Bimachine Compiler for Ranked Tagging Rules
This paper describes a novel method of compiling ranked tagging rules into a deterministic finite-state device called a bimachine. The rules are formulated in the framework of regular rewrite operations and allow unrestricted regular expressions in both left and right rule contexts. The compiler is illustrated by an application within a speech synthesis system.
Weighted Grammar Tools: the Grm Library
We describe the algorithmic and software design principles of a general grammar library designed for use in spoken-dialogue systems, speech synthesis, and other speech processing applications. The library is a set of general-purpose software tools for constructing and modifying weighted finite-state acceptors and transducers representing grammars. The tools can be used in particular to compile ...
Compilation of Weighted Finite-State Transducers from Decision Trees
We report on a method for compiling decision trees into weighted finite-state transducers. The key assumptions are that the tree predictions specify how to rewrite symbols from an input string, and the decision at each tree node is stateable in terms of regular expressions on the input string. Each leaf node can then be treated as a separate rule where the left and right contexts are constructa...